The 2019-2020 coronavirus pandemic is likely to be one of the most influential health crises of the 21st century. COVID-19, the disease at the centre of the pandemic, is caused by a new strain of coronavirus (SARS-CoV-2) that had not previously been identified in humans and that causes respiratory illness. The health impact of the virus continues to be of great concern worldwide. However, the economic impact of the COVID-19 pandemic is likely to be felt for months after the virus has gone. Over the past several weeks, global share markets have dropped, businesses have closed, and governments have introduced strict social distancing and lockdown laws.
In this project, the impact of government responses and indicators on the coronavirus pandemic is explored. The following visualisations explore the fundamental trends of the coronavirus pandemic, the different lockdown measures that have been introduced and how government indicators have influenced the pandemic.
Below are links to Jupyter Notebooks containing the explanations for the visualisations presented in this website, the datasets utilised and the original code used to create the visualisations.
Explainer Notebook: https://ldorigo.github.io/visualisation_project/explainer.html
Code: https://ldorigo.github.io/visualisation_project/code.html
Datasets: https://drive.google.com/open?id=19Q156Mwic7mMJDArwol_BSnOoVu_T8O3
Before you continue to the visualisations, there are a few crucial points you should be aware of regarding the use of the website. In the bottom right-hand corner of the website there is a set of arrows, which is used to navigate through its pages; alternatively, you can use the arrow keys on your keyboard if available. Each visualisation has three parts: the introduction, the visualisation and the conclusion. Use the right arrow to move between visualisations and the down arrow to move between the introduction, visualisation and conclusion. You can use the up and left arrows at any time to go back to previous sections of the website.
The first visualisation shows an overview of the number of COVID-19 cases in all countries at a given date. The slider at the bottom of the plot can be used to show the data for different dates; dragging it shows how the pandemic developed over time. Hovering over a country shows its total number of cases at the chosen date.
plotly.offline.init_notebook_mode()
Create logarithmic colour scale:
plotly.offline.iplot(fig)
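The code that builds the logarithmic colour scale is not reproduced above. As a hedged sketch of one common approach (an assumption, not necessarily the authors' code), the choropleth can be coloured by log10 of the case counts while the colour bar is labelled with the original counts. The helper names `log_values` and `log_colorbar_ticks` are hypothetical.

```python
import math


def log_values(counts):
    """Map case counts to log10 values for the colour scale.

    Zero counts are mapped to 0 (the same as a single case),
    since log10(0) is undefined.
    """
    return [math.log10(c) if c > 0 else 0.0 for c in counts]


def log_colorbar_ticks(max_count):
    """Colour-bar tick positions (log10 units) and labels (case counts).

    One tick per power of ten, from 1 up to the first power of ten >= max_count.
    """
    top = 0
    while 10 ** top < max_count:
        top += 1
    tickvals = list(range(top + 1))
    ticktext = ["{:,}".format(10 ** p) for p in tickvals]
    return tickvals, ticktext


z = log_values([0, 5, 1200, 1_000_000])
tickvals, ticktext = log_colorbar_ticks(1_000_000)
```

In a plotly choropleth trace, these values could be passed as `z=z` together with `colorbar=dict(tickvals=tickvals, ticktext=ticktext)`, so that the colours follow a log scale while the labels stay readable.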
The general trend shows the outbreak initially confined to China, before spreading to Europe and North America. The visualisation also shows that the number of cases increased at the greatest rate during the last month covered by the data. While some countries have over 1 million cases, others, for example many countries in Africa, have so far recorded far fewer. The next step is to look at the overall statistics of the number of cases.
The following plot shows the general evolution of the number of confirmed COVID-19 cases, COVID-19-related deaths and recovered patients. For each category, the same data is shown twice: on a linear y-axis (the panels labelled "Exponential", where exponential growth appears as a steep curve) and on a logarithmic y-axis (the panels labelled "Logarithmic").
x_ticks = []
xticklabels = []
# keep every 15th date as an x tick, labelled as day/month
for i, col in enumerate(cases_data):
    if i % 15 == 0:
        x_ticks.append(col)
        xticklabels.append(col.strftime("%d/%m"))
Initialise the figure and different subplots:
fig, axs = plt.subplots(3,
2,
figsize=(12, 8),
dpi=100,
sharex='col',
gridspec_kw={
'hspace': 0.2,
'wspace': 0.15
});
(ax1, ax2), (ax3, ax4), (ax5, ax6) = axs;
Plot each of the subplots. Here, the .sum() function adds up each date column across all countries, giving the worldwide total for each date, which is then plotted:
ax1.plot(cases_data.columns, cases_data.sum(), 'tab:orange')
ax2.plot(cases_data.columns, cases_data.sum(), 'tab:orange')
ax3.plot(deaths_data.columns, deaths_data.sum(), 'tab:red')
ax4.plot(deaths_data.columns, deaths_data.sum(), 'tab:red')
ax5.plot(recovered_data.columns, recovered_data.sum(), 'tab:green')
ax6.plot(recovered_data.columns, recovered_data.sum(), 'tab:green')
Add titles to the plot:
fig.suptitle('COVID-19 Cases Summary', fontsize=24)
ax1.set(title="Confirmed Cases - Exponential")
ax2.set(title="Confirmed Cases - Logarithmic", yscale="log")
ax3.set(title="Total Deaths - Exponential")
ax4.set(title="Total Deaths - Logarithmic", yscale="log")
ax5.set(title="Recovered Patients - Exponential")
ax6.set(title="Recovered Patients - Logarithmic", yscale="log")
Set the font size of the subplot titles:
for ax in axs.flat:
    ax.title.set_size(18)
Add the tick labels:
for ax in axs.flat:
    ax.set_xticks(x_ticks)
    ax.set_xticklabels(xticklabels)
    # ax.tick_params('x', rotation=40)
    # tick labels at font size 12 on both axes
    # (Tick.label is deprecated in recent Matplotlib versions)
    ax.tick_params(axis='both', labelsize=12)
Add two main axis labels
display(fig)
The cases summary visualisation shows each category of COVID-19 case on a linear and on a logarithmic scale. The linear plots show that the steep, exponential increase in the number of cases began around the 22nd of March 2020, about a week and a half after the World Health Organization declared COVID-19 a pandemic on the 11th of March 2020. The logarithmic plots show an apparent plateau around the 21st of February, which could be mistaken for a carrying capacity (the maximum number of cases the outbreak would reach). However, this was not the true carrying capacity: the pandemic continued to spread after this date, which is where the second increase in cases is observed.
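On a logarithmic axis, exponential growth appears as a straight line, because log(N0·e^(r·t)) = log N0 + r·t. A minimal sketch, assuming a list of cumulative daily totals such as `cases_data.sum()` produces (the function names and toy data are hypothetical), estimates the daily growth rate from consecutive log differences and converts it into a doubling time; a flattening of the logarithmic curve shows up as a falling growth rate.

```python
import math


def daily_growth_rates(totals):
    """Daily log growth rates r_t = ln(N_t / N_{t-1}) for cumulative totals.

    Pairs of days where either total is zero are skipped, since the log is undefined.
    """
    rates = []
    for prev, curr in zip(totals, totals[1:]):
        if prev > 0 and curr > 0:
            rates.append(math.log(curr / prev))
    return rates


def doubling_time(rate):
    """Days needed to double at a constant daily log growth rate."""
    return math.log(2) / rate if rate > 0 else math.inf


# toy data: a series that doubles every day, then levels off
totals = [100, 200, 400, 800, 800, 800]
rates = daily_growth_rates(totals)
```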
An epidemic or pandemic curve is a statistical chart used to visualise the evolution of a disease or virus outbreak; it can be used to identify the different stages of an outbreak and when each of them occurs. The phrase "flatten the curve", heard continuously over the last few months of the COVID-19 pandemic, refers to this curve: authorities want to bring the peak of the curve down so that health care systems are not overwhelmed. For this visualisation, only a selection of countries has been analysed: Australia, China, Denmark, France, India, Iran, Italy, Mexico, Sweden and the USA. An important point to note is that the curve represents the number of active cases on a given day, so recovered patients and deaths are excluded. The data is also shown as a fraction of each country's population, so you can see what share of the population is currently infected and compare it with other countries. You can choose which countries to display by clicking their country codes on the left-hand side of the visualisation to toggle them on and off.
To see the true shape of the pandemic curve, only the number of active cases per day is visualised. The original datasets are cumulative, which is not suitable for showing active cases per day. Therefore, the cumulative deaths and recoveries are subtracted from the cumulative confirmed cases:
# calculate the daily number of active cases
daily_cases_sub_recovered = cases_data.subtract(recovered_data)
daily_cases_data = daily_cases_sub_recovered.subtract(deaths_data)
Due to the large number of countries in the datasets, the pandemic curve plot only analyses specific focus countries: Australia, China, Denmark, France, India, Iran, Italy, Mexico, Sweden and the USA.
# focus countries (a list: recent pandas versions do not accept a set as an indexer)
focus_countries = [
    "AUS", "CHN", "DNK", "FRA", "IND", "IRN", "ITA", "MEX", "SWE", "USA"
]
# select the focus countries and transpose so dates are rows and countries are columns
data_focused = daily_cases_data.loc[focus_countries].T
# divide each country's column by its population to get active cases per person
pops = worldbank_data.loc[focus_countries, 'population']
norm_data = data_focused / pops
Convert the Pandas Dataframe to Bokeh ColumnDataSource:
source = ColumnDataSource(norm_data)
Initialise a list of x-range values, in this case the dates in the index, converted to strings:
hours = [str(i) for i in norm_data.index.values]
Create the Bokeh plot:
Show the plot:
show(p)
From this visualisation, we can see that every country is at a different stage of the pandemic. Some countries, for example Australia and China, have successfully flattened the curve, compared to countries like Italy and the USA, where a much larger fraction of the population has been infected. In China and Australia, the virus appears to have been contained, and these countries are entering the final stages of the pandemic, as shown by their falling pandemic curves. By contrast, for countries like the USA, Sweden and France, the pandemic is still in its earlier stages, as the number of cases continues to climb. For some countries, the pandemic curve follows a clear bell-shaped trend, while for others, like France and Denmark, the curve is more jagged. These trends are likely related, at least in part, to the types of government measures that have been introduced, which are explored in the following visualisation.
To get an overview of how different countries reacted to the outbreak, we have assembled a visualisation that shows the specific measures implemented by each country, plotted by the time since the outbreak and the number of cases at the time of implementation. You can select different categories and obtain more information on specific measures by hovering over them.
To show the time it took for each government to implement specific policies, we first need to compute the start date of the pandemic in each country. We define this as the first day on which the country has more than 50 cases: although arbitrary, this threshold is a good compromise between not leaving too many countries out (i.e., countries with very few cases) and not having too much noise (countries with extremely few cases are often outliers and make the data messy).
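The code that computes the start date is not shown here. A minimal sketch of the rule described above, assuming each country's cumulative cases are available as a list ordered by date and starting on 22 January 2020 (the first date in the cases dataset); `first_day_over` is a hypothetical helper, the country codes are toy values, and `None` marks countries that never pass the threshold. The resulting day offsets presumably play the role of `firstdays` below.

```python
from datetime import date, timedelta

DATASET_START = date(2020, 1, 22)  # first date in the cases dataset


def first_day_over(cumulative_cases, threshold=50):
    """Index of the first day with strictly more than `threshold` cases, or None."""
    for day_index, cases in enumerate(cumulative_cases):
        if cases > threshold:
            return day_index
    return None


series = {"AAA": [0, 10, 49, 50, 51, 80], "BBB": [0, 1, 2]}  # hypothetical toy data
firstdays = {cc: first_day_over(s) for cc, s in series.items()}
start_dates = {cc: DATASET_START + timedelta(days=d)
               for cc, d in firstdays.items() if d is not None}
```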
Start building a dataframe with the data we need:
# one row per country: the index holds the ISO codes of the countries we're interested in
df_responses_in_time = pd.DataFrame(index=significative_cases_data.index)
df_responses_in_time = df_responses_in_time.assign(
    days_before_first_case=firstdays)
# 22 January 2020 is the first date in the cases dataset; an explicit
# ISO date avoids any day/month ambiguity when parsing
df_responses_in_time = df_responses_in_time.assign(
    initial_day=pd.Timestamp('2020-01-22'))
Drop rows that contain NaNs. This makes it easier to plot, and while we lose some data, we're only interested in trends, so it shouldn't matter:
We now need to merge (join) the "government measures" data to our dataframe:
Compute the number of days that passed between the start of the outbreak and the implementation of each measure (in a specific country):
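A minimal sketch of this computation, assuming the outbreak in a country starts `days_before_first_case` days after `initial_day` (the two columns built above) and that each measure comes with an implementation date; the function name and the dates are hypothetical. A measure introduced before a country reached 50 cases gives a negative number of days.

```python
from datetime import date, timedelta


def days_since_outbreak(initial_day, days_before_first_case, measure_date):
    """Days between a country's outbreak start and the implementation of a measure."""
    outbreak_start = initial_day + timedelta(days=days_before_first_case)
    return (measure_date - outbreak_start).days


initial_day = date(2020, 1, 22)
delay = days_since_outbreak(initial_day, 40, date(2020, 3, 16))
```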
Again, remove NaNs to make plotting easier:
Define what should be displayed on hover:
We generate many colours to represent the different countries. The colours carry no meaning; they only make it easier to distinguish countries.
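The original colour-generation code is not shown. One hedged way to produce many distinguishable colours with only the standard library's `colorsys` module (an assumption, not necessarily what was used here) is to space hues evenly around the colour wheel and convert them to hex strings, which Bokeh accepts as colours; `distinct_colours` is a hypothetical name.

```python
import colorsys


def distinct_colours(n, saturation=0.65, value=0.85):
    """Return n hex colour strings with evenly spaced hues."""
    colours = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colours.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours


palette = distinct_colours(10)
```

With many countries, neighbouring hues become hard to tell apart, so varying saturation or value as well is a possible refinement.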
Initialize bokeh datasource:
Initialize the figure:
Generate plots for all types of measures:
Set plot trimmings:
Add hover tool to the plot:
show(measures_plot)
What is interesting in this visualisation is that many countries introduced measures gradually rather than all at once. Each time a new measure is introduced, the number of cases at that time is higher than when the government introduced its previous measure. This suggests that governments are actively monitoring the situation and responding to the growing number of cases with stricter measures in an attempt to curb the spread of the virus. There are many reasons why different measures are introduced; however, questions such as whether some countries were quicker to go into lockdown because they lack health system capacity are interesting to explore. The following plots explore the government measures and how the healthcare system may have contributed to the different government decisions regarding the pandemic.
To start the more analytical part of this investigation, let's look at how a country's GDP and healthcare expenditure relate to the spread of the disease. The red line in each plot is a linear regression of the number of cases (adjusted by population) on the corresponding indicator. Hovering over a point in this visualisation reveals the country.
We take both the GDP and healthcare expenditure data directly from the World Bank dataset:
Remove San Marino: with its very small population it is an outlier that skews the rest of the data:
Compute the total number of coronavirus cases in each country. Since the COVID dataset is cumulative, this is simply the number of cases on the latest date in the dataset:
Yet again, drop rows that contain NaNs to make plotting possible:
Initialize the bokeh data source and the hover tooltips:
For each indicator, we plot all the countries and fit a linear regression of cases per capita on that indicator:
for i, (title, plot, indicator) in enumerate(zip(titles, plots, indicators)):
    # one point per country (scatter's default marker is a circle)
    plot.scatter(x=indicator,
                 y='cases_per_capita',
                 source=source,
                 size=10,
                 fill_alpha=0.6)
    # fit a simple linear regression of cases per capita on the indicator
    X = data_complete[indicator].values.reshape(-1, 1)
    y = data_complete.cases_per_capita
    reg = LinearRegression().fit(X, y)
    y_predicted = reg.predict(X)
    score = reg.score(X, y)  # coefficient of determination, R^2
    plot.line(data_complete[indicator],
              y_predicted,
              color='red',
              legend_label="R^2: {:.3f}".format(score))
    plot.xaxis.axis_label = title
    if i == 0:
        # only the leftmost plot gets a y-axis label
        plot.yaxis.axis_label = "Cases (adjusted by population)"
    plot.title.text_font_size = fontsize_titles_px
    plot.xaxis.axis_label_text_font_size = fontsize_labels_px
    plot.yaxis.axis_label_text_font_size = fontsize_labels_px

# hide the y tick labels of the healthcare plot, which sits to the right of the GDP plot
health_plot.yaxis.major_label_text_font_size = "0pt"
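For readers unfamiliar with the R^2 value shown in the legend: for a single indicator, scikit-learn's `LinearRegression` performs an ordinary least-squares fit, and its `score` method returns the coefficient of determination. A dependency-free sketch of the same quantities, using hypothetical toy numbers rather than the project's data:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared).

    Requires at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


a, b, r2 = fit_line([1, 2, 3, 4], [2, 4, 5, 8])
```

An R^2 close to 0 means the indicator explains little of the variation in cases per capita; a value close to 1 means the points lie close to the red line.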
Format the plot with a title:
titlediv = row(Div(
text=
"<h1 style='text-align: center'> Influences of GDP and Healthcare Expenditure on COVID19 cases</h1>"
),
sizing_mode='scale_width')
plots = gridplot([[gdp_plot, health_plot]])
health_plot = column([titlediv, plots])
Although not shown in the visualizations, out of curiosity, we try multiple regression to see if using both indicators is better for predicting the number of cases:
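The multiple-regression code is not reproduced here. As a hedged, dependency-free sketch (in practice `LinearRegression` accepts a two-column `X` directly), a fit with two predictors can be computed from the normal equations; the helper name and the toy data are hypothetical. Note that the in-sample R^2 never decreases when a predictor is added, so a fair comparison with the single-indicator fits would need an adjusted R^2 or a held-out evaluation.

```python
def fit_two_predictors(x1, x2, y):
    """OLS fit y = b0 + b1*x1 + b2*x2 via the normal equations (X^T X) b = X^T y.

    Returns ([b0, b1, b2], r_squared).
    """
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # build the augmented 3x4 system [X^T X | X^T y]
    system = []
    for i in range(3):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(3)]
        rhs = sum(r[i] * t for r, t in zip(rows, y))
        system.append(lhs + [rhs])
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(system[k][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for k in range(col + 1, 3):
            factor = system[k][col] / system[col][col]
            for j in range(col, 4):
                system[k][j] -= factor * system[col][j]
    # back substitution
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        acc = system[i][3] - sum(system[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = acc / system[i][i]
    predictions = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, predictions))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return coef, 1 - ss_res / ss_tot


# toy data generated exactly from y = 1 + 2*x1 + 3*x2
coef, r2 = fit_two_predictors([0, 1, 2, 3, 4], [1, 0, 2, 1, 3], [4, 3, 11, 10, 18])
```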
show(health_plot)
Surprisingly, greater GDP and greater healthcare expenditure appear to be positively correlated with the number of cases per capita. Countries like the USA and Luxembourg have a high GDP and spend more on healthcare, yet have a large number of cases per capita. This trend is likely influenced by the fact that the coronavirus initially spread through many wealthy countries in Europe, which have therefore been experiencing the pandemic for longer and have accumulated more cases. However, it could also be explained by government stances relating to healthcare: governments of countries with better healthcare systems may have felt better poised to treat patients with the virus and thus waited longer to put the country into lockdown, whereas poorer countries may have gone straight into lockdown. The next visualisation explores this hypothesis.
We now want to look at how the capacity of a country's healthcare system affected the speed of the government response, answering questions such as "how does the strength of a country's healthcare system influence how quickly that country responds to the pandemic?". Here, we can see how the healthcare coverage index relates to the time it took for each government to implement measures corresponding to a score of 50 or more on the Oxford Stringency Index. The Oxford Stringency Index is a composite score from 0 to 100 that summarises how strict a government's containment measures (such as school and workplace closures and travel restrictions) are; it does not measure whether those measures are effective. An important note is that the size of a country's "bubble" on this plot reflects the capacity of its healthcare system, measured as hospital beds per 1,000 people. Hovering over a point reveals the country and the corresponding numbers.
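The code for the time-to-stringency measure is not shown. A minimal sketch of the quantity described above, assuming a country's daily stringency values are a list aligned on the same dates as its cases and that the outbreak start is a day index (as with the 50-case rule used earlier); `days_to_stringency` is a hypothetical name. A country that never reaches the threshold gets `None`, and a negative result means the measures came before the outbreak start.

```python
def days_to_stringency(stringency, outbreak_day, threshold=50):
    """Days from outbreak start until stringency first reaches `threshold`.

    `stringency` holds one value per day (index 0 = first date in the data);
    returns None if the threshold is never reached.
    """
    for day, value in enumerate(stringency):
        if value >= threshold:
            return day - outbreak_day
    return None
```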
We start by only keeping rows in the worldbank data that have the info we need:
We then merge that with the Oxford Stringency Index dataset:
As before, remove rows with NaNs:
To make it a little easier to handle, we only keep the columns we need:
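Scale the number of hospital beds per 1,000 people into bubble sizes between 0.3 and 2.3 so that the plot renders nicely:
beds = dataset_small['hospital_beds_per_1000']
# the country with the most beds gets 2.3; the others scale linearly down towards 0.3
dataset_small['beds_normalized'] = 0.3 + 2 * beds / beds.max()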
show(speed_of_reaction_plot)
The idea behind this plot was to see whether countries with a powerful healthcare system (and with a lot of hospital beds) could "afford" to wait longer before implementing strong measures. This does appear to be the case: the time it took for countries to enforce stringent policies seems to increase with the healthcare coverage index, and countries with many hospital beds per capita ("big bubbles") tend to be in the upper part of the plot.
As always, there may be many confounding factors at play. For instance, the coronavirus hit Europe first, and Europe has many of the most developed healthcare systems in the world, meaning that countries with "better healthcare" may have had less time overall to plan and implement policies. Other government factors may also have come into play; these are explored in the next visualisation.
Let's look at the influence that various political and economic indicators have on a country's ability to face the pandemic. Each subplot corresponds to one indicator, which represents an aspect of a country's government and is shown on the horizontal axis. The vertical axis shows, for each range of the indicator, the proportion of countries in that range that have succeeded in curbing the pandemic, i.e., countries whose pandemic curve has reached a peak.
For this visualisation, we need to find the countries whose outbreak has reached a peak. We do this using SciPy's find_peaks function (scipy.signal.find_peaks), together with some custom logic to detect countries that have not peaked yet (which find_peaks does not detect):
days_before_peak = {}
for cc in countrycodes:
    try:
        testseries = daily_cases_data.loc[cc]
    except KeyError:
        continue
    # distance=1000 is longer than the series, so at most the highest peak is returned
    peaks, _ = find_peaks(testseries, distance=1000)
    # No peak found, or the latest value is higher than the "peak": find_peaks
    # cannot report the last day as a peak, so the curve is still rising.
    if len(peaks) == 0 or testseries.iloc[-1] > testseries.iloc[peaks[0]]:
        peak = len(testseries)
        has_peaked = False
    else:
        peak = peaks[0]
        has_peaked = True
    try:
        firstday = firstdays_dict[cc]
    except KeyError:
        continue
    # subtract the days until the outbreak to get the actual speed in that country
    days_before_peak[cc] = [peak - firstday, has_peaked]
peaks_df = pd.DataFrame.from_dict(days_before_peak,
                                  orient='index',
                                  columns=['days_before_peak', 'has_peaked'])
peaks_and_wb = peaks_df.merge(worldbank_data,
left_index=True,
right_index=True,
how='inner')
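To see why the extra check is needed: find_peaks only reports samples that are higher than both of their neighbours, so the last day of a still-rising curve can never be a peak. The sketch below reproduces the decision rule above in plain Python on toy curves. It is a simplification (it only considers strict interior maxima and ignores the plateau handling that find_peaks performs), and `peak_status` is a hypothetical helper.

```python
def peak_status(curve):
    """Return (peak_index, has_peaked) following the rule used above.

    A peak is the highest interior point that is strictly greater than both
    neighbours; if there is none, or the latest value exceeds it, the curve
    is treated as still rising and len(curve) is returned as the index.
    """
    interior_peaks = [i for i in range(1, len(curve) - 1)
                      if curve[i - 1] < curve[i] > curve[i + 1]]
    if not interior_peaks:
        return len(curve), False
    top = max(interior_peaks, key=lambda i: curve[i])
    if curve[-1] > curve[top]:
        return len(curve), False
    return top, True
```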
Choose some indicators that we want to investigate:
indicators = [
"political_stability", "government_effectiveness",
"voice_and_accountability", 'rule_of_law',
'self_payed_health_expenditure_percent_of_total', "freedom_score",
"corruption_control", "regulatory_quality"
]
titles = [
'Political Stability', 'Government Effectiveness',
'Voice and Accountability', 'Rule of Law',
'Percentage of Self-Paid Healthcare Expenditure', 'Freedom of Press',
'Corruption Control', 'Regulatory Quality'
]
Make a small "histogram" for each subplot. Note that here the bar height does not correspond to frequency, as in a normal histogram, but to the proportion of countries in that bin that have successfully curbed the epidemic:
figures = []
for index, indicator in enumerate(indicators):
    # 10 evenly spaced edges give 9 bins spanning the indicator's range
    vrange = np.linspace(peaks_and_wb[indicator].min(),
                         peaks_and_wb[indicator].max(),
                         num=10)
    dist = vrange[1] - vrange[0]  # bin width
    middlepoints = [(vrange[i] + vrange[i + 1]) / 2
                    for i in range(len(vrange) - 1)]
    averages = []
    in_bin = []
    for i in range(len(vrange) - 1):
        start = vrange[i]
        stop = vrange[i + 1]
        # the last bin is closed on the right so that the maximum value is included
        if i == len(vrange) - 2:
            upper = peaks_and_wb[indicator] <= stop
        else:
            upper = peaks_and_wb[indicator] < stop
        countries = peaks_and_wb[(peaks_and_wb[indicator] >= start) & upper]
        in_bin.append(", ".join(countries.country))
        # proportion of countries in this bin that have reached a peak (0 if empty)
        averages.append(countries.has_peaked.mean() if len(countries) else 0)
    TOOLTIPS = [("Countries in this bin", "@countries_peaked"),
                (indicator, "@midpoints"),
                ("Proportion of countries that have reached a peak",
                 "@proportion_peaked")]
    bin_df = pd.DataFrame({'midpoints': middlepoints,
                           'proportion_peaked': averages,
                           'countries_peaked': in_bin})
    source_bin = ColumnDataSource(bin_df)
    binary_plot = figure(tooltips=TOOLTIPS,
                         y_range=(0, 1),
                         width=250,
                         height=160,
                         title=titles[index])
    # bars slightly narrower than the bin width so that they don't touch
    binary_plot.vbar(source=source_bin,
                     x="midpoints",
                     top="proportion_peaked",
                     width=0.95 * dist)
    binary_plot.xaxis.axis_label_text_font_size = fontsize_labels_px
    binary_plot.yaxis.axis_label_text_font_size = fontsize_labels_px
    figures.append(binary_plot)
Add a title to the subplots:
show(grid)
Some of the trends were to be expected: for instance, the proportion of countries that managed to control the epidemic is greater for countries whose "Government effectiveness" and "Rule of law" are high. On the other hand, some trends are quite surprising and interesting: "Voice and Accountability" (which is a measure of democracy) has two peaks on both sides of the spectrum. Our interpretation is that countries with strong, totalitarian governments (the peak on the left of the graph), where a citizen has little choice but to obey the government's decisions, are better able to react swiftly to a crisis like this than weak democracies (the valley in the middle of the graph). On the other hand, strong democracies (the peak on the right of the graph) mostly correspond to wealthy European countries, which are better able to face the crisis due to many other reasons.
What is now interesting to analyse is how the stringency indexes have evolved and their impact on the pandemic curve, since we now understand some of the factors that have influenced government measures.
Having explored how the pandemic spread across the world, we will now look at how government measures have changed over time, as represented by the Oxford Stringency Index. The slider at the bottom of the plot can be used to show the data for different dates; dragging it shows how the stringency index changed as the pandemic evolved. Hovering over a country provides further numerical information. This visualisation is based on the visualisations presented by the University of Oxford [4].
Import the dataset, keep only the relevant columns and store them in a new dataframe:
stringency = pd.read_csv("../datasets/OxCGRT_latest.csv")
stringency_df = stringency[
    ['CountryName', 'CountryCode', 'Date', 'StringencyIndex']].copy()
Some countries do not have complete data for the date range 01/01/2020 to 24/04/2020, so they are removed here:
# Cabo Verde, Lesotho and Macao
stringency_df = stringency_df[~stringency_df.CountryCode.isin(["CPV", "LSO", "MAC"])]
Initially, the dataset listed the stringency index for each day and country as a separate row. The data was reshaped so that countries form the row index and dates form the column index. This presents the data as one large matrix, which is easier to plot.
# drop the last row and make sure the index values are numeric
stringency_df.drop(stringency_df.tail(1).index, inplace=True)
stringency_df["StringencyIndex"] = pd.to_numeric(
    stringency_df["StringencyIndex"])
# empty matrix: one row per country, one column per date
dates = stringency_df['Date'].unique()
stringency_df5 = pd.DataFrame(columns=dates)
stringency_df5['CountryCode'] = stringency_df['CountryCode'].unique()
stringency_df5 = stringency_df5.set_index('CountryCode')
# fill the 147 countries row by row; this assumes the rows are sorted by
# country and that every country has exactly 115 consecutive daily entries
for i in range(147):
    stringency_df5.iloc[i, :] = np.asarray(
        stringency_df['StringencyIndex'].iloc[i * 115:(i + 1) * 115])
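The loop above relies on every country having exactly 115 consecutive rows. pandas' `DataFrame.pivot`, e.g. `stringency_df.pivot(index='CountryCode', columns='Date', values='StringencyIndex')`, performs the same long-to-wide reshape without that assumption; since the resulting columns are sorted, converting `Date` to datetime beforehand keeps them in chronological order. The idea itself, sketched in plain Python on hypothetical toy rows (`pivot_long_to_wide` is a hypothetical name):

```python
def pivot_long_to_wide(rows):
    """Turn (country, date, value) rows into {country: {date: value}}.

    A missing (country, date) combination simply stays absent, instead of
    shifting the following values as fixed-size slicing would.
    """
    wide = {}
    for country, day, value in rows:
        wide.setdefault(country, {})[day] = value
    return wide


rows = [("AAA", "2020-01-22", 0.0), ("AAA", "2020-01-23", 11.1),
        ("BBB", "2020-01-23", 25.0)]
wide = pivot_long_to_wide(rows)
```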
The last row of the matrix contains only NaN values: this country does not have data for the entire date range, so it is removed here:
stringency_df5 = stringency_df5.iloc[:147]
For consistency between datasets, there must be a common date range, so data before 22/01/2020 is removed to match the cases dataset:
stringency_df5 = stringency_df5.drop([
'1/01/2020', '2/01/2020', '3/01/2020', '4/01/2020', '5/01/2020',
'6/01/2020', '7/01/2020', '8/01/2020', '9/01/2020', '10/01/2020',
'11/01/2020', '12/01/2020', '13/01/2020', '14/01/2020', '15/01/2020',
'16/01/2020', '17/01/2020', '18/01/2020', '19/01/2020', '20/01/2020',
'21/01/2020'
],
axis=1)
Convert the column labels to datetime objects:
stringency_df5.columns = pd.to_datetime(stringency_df5.columns, format='%d/%m/%Y')
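Listing the 21 dropped dates by hand works, but parsing the labels with an explicit format and filtering by comparison is less error-prone; the explicit format also matters because, with month-first parsing, a label such as 1/02/2020 would be read as 2 January. A hedged sketch using only the standard library (`dates_from` is a hypothetical helper; `%d` and `%m` also accept non-zero-padded values such as 1/01/2020):

```python
from datetime import datetime


def dates_from(labels, start="22/01/2020", fmt="%d/%m/%Y"):
    """Parse day/month/year labels and keep those on or after `start`."""
    start_date = datetime.strptime(start, fmt)
    parsed = [datetime.strptime(label, fmt) for label in labels]
    return [d for d in parsed if d >= start_date]


kept = dates_from(["1/01/2020", "21/01/2020", "22/01/2020", "1/02/2020"])
```

If the columns are converted to datetime first, the same filter in pandas could be written as `stringency_df5.loc[:, stringency_df5.columns >= pd.Timestamp('2020-01-22')]`.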
Set colours
cscale= [[0, 'rgb(68, 1, 84)'],
[0.33, 'rgb(49, 104, 142)'],
[0.66, 'rgb(109, 205, 89)'],
[1, 'rgb(253, 231, 37)'],]
Create the slider data. Each entry is a choropleth of the stringency index for all countries on a given date:
data_slider = []
for col in stringency_df5.columns:
    # one choropleth trace per date
    data = dict(
        type='choropleth',
        colorscale=cscale,
        autocolorscale=False,
        locations=stringency_df5.index,
        z=stringency_df5.loc[:, col].astype(float),
        zmax=100,
        zmin=0,
        locationmode='ISO-3',
        marker=dict(line=dict(color='rgb(255,255,255)', width=1)),
        colorbar=dict(title="Stringency index"),
    )
    data_slider.append(data)
Format the dates for the slider labels (e.g. "Jan 22"):
dates = [col.strftime('%b %d') for col in stringency_df5.columns]
Create the slider steps; each step makes only the trace for its own date visible:
steps = []
for i in range(len(data_slider)):
    step = dict(method='restyle',
                args=['visible', [False] * len(data_slider)],
                label=dates[i])
    step['args'][1][i] = True
    steps.append(step)
sliders = [dict(active=0, pad={"t": 1}, steps=steps)]
Create layout and figure
layout = dict(title= "Stringency Index Over Time",
geo = dict(
projection={'type':'natural earth' }),
sliders=sliders)
fig = dict(data=data_slider, layout=layout)
Show figure
plotly.offline.iplot(fig)
An interesting observation can be made around the first weeks of March, when the stringency index increases worldwide. Comparing this with the summary statistics plot at the beginning of the website, this was around the time when the number of coronavirus cases worldwide began to increase very quickly, which helps explain the rise in the stringency index as governments attempted to stop the spread of the virus.
Now that we have seen the evolution of the stringency index worldwide, it is interesting to see how it correlates with the number of coronavirus cases. The following visualisation shows the maximum number of cases and the maximum stringency index for each country. Hovering over a point shows which country it corresponds to. A logistic regression trend line is also shown in red. This visualisation is based on the visualisations presented by the University of Oxford [4].
Import the dataset, keep only the relevant columns and store them in a new dataframe:
stringency = pd.read_csv("../datasets/OxCGRT_latest.csv")
stringency_df = stringency[
    ['CountryName', 'CountryCode', 'Date', 'StringencyIndex']].copy()
Some countries do not have complete data for the date range 01/01/2020 to 24/04/2020, so they are removed here:
# Cabo Verde, Lesotho and Macao
stringency_df = stringency_df[~stringency_df.CountryCode.isin(["CPV", "LSO", "MAC"])]
Initially, the dataset listed the stringency index for each day and country as a separate row. The data was reshaped so that countries form the row index and dates form the column index. This presents the data as one large matrix, which is easier to plot.
# drop the last row and make sure the index values are numeric
stringency_df.drop(stringency_df.tail(1).index, inplace=True)
stringency_df["StringencyIndex"] = pd.to_numeric(
    stringency_df["StringencyIndex"])
# empty matrix: one row per country, one column per date
dates = stringency_df['Date'].unique()
stringency_df5 = pd.DataFrame(columns=dates)
stringency_df5['CountryCode'] = stringency_df['CountryCode'].unique()
stringency_df5 = stringency_df5.set_index('CountryCode')
# fill the 147 countries row by row; this assumes the rows are sorted by
# country and that every country has exactly 115 consecutive daily entries
for i in range(147):
    stringency_df5.iloc[i, :] = np.asarray(
        stringency_df['StringencyIndex'].iloc[i * 115:(i + 1) * 115])
The last row of the matrix contains only NaN values: this country does not have data for the entire date range, so it is removed here:
stringency_df5 = stringency_df5.iloc[:147]
For consistency between datasets, there must be a common date range, so data before 22/01/2020 is removed to match the cases dataset:
stringency_df5 = stringency_df5.drop([
'1/01/2020', '2/01/2020', '3/01/2020', '4/01/2020', '5/01/2020',
'6/01/2020', '7/01/2020', '8/01/2020', '9/01/2020', '10/01/2020',
'11/01/2020', '12/01/2020', '13/01/2020', '14/01/2020', '15/01/2020',
'16/01/2020', '17/01/2020', '18/01/2020', '19/01/2020', '20/01/2020',
'21/01/2020'
],
axis=1)
Make sure both datasets use datetime objects (with an explicit day/month/year format for the stringency dates, so that labels such as 1/02/2020 are not read month-first):
stringency_df5.columns = pd.to_datetime(stringency_df5.columns, format='%d/%m/%Y')
cases_data.columns = pd.to_datetime(cases_data.columns)
Remove last three columns so both data sets have the same date range
cases_data = cases_data.iloc[:, :-3]
total_cases = cases_data['2020-04-24']
max_stringency = stringency_df5.max(axis=1)
horizontal_stack = pd.concat([total_cases, max_stringency], axis=1)
horizontal_stack.columns = ['Cases', 'Stringency']
Plot the data:
Show the plot:
show(p)
The clear trend in this visualisation is that the maximum stringency level increases with the number of coronavirus cases. This is expected, as governments introduce stricter measures in an attempt to curb the virus. The majority of countries reached a maximum stringency level above 80, which means that many governments adopted strict measures, effectively shutting down their countries to try to stop the spread of coronavirus. The next question is how these high stringency levels have affected the pandemic curve.
This visualisation looks at how the stringency index has impacted the pandemic curve. The pandemic curve is the same as seen previously; however, this time, the bars have been coloured to represent the stringency index.
show(layout)
As the curve rises to a peak, the stringency index increases with it, and by the time they reached the peak, most countries had implemented strict government measures. In most cases, after the peak and the introduction of strict lockdown measures (shown by the high stringency index), the number of active cases began to decrease. However, some countries implemented strict measures before a peak was in sight, including India, France, Denmark and Mexico. For many of these countries, the number of cases is still rising, which may reflect governments acting pre-emptively in anticipation of what is to come. For some countries, there is a clear build-up of the stringency index, showing government measures slowly becoming stricter. For others, such as France, the stringency index stayed low and then jumped suddenly to a high value. This suggests governments trying to delay lockdown measures but then being overwhelmed by the number of cases, throwing the country into a sudden lockdown.
From all of the visualisations, it is clear that the impact of coronavirus has been felt worldwide. Looking at the evolution of the virus, its growth is clearly exponential, and the rate of spread has increased over the last six weeks. In response, governments have introduced increasingly strict measures over the same period, as shown by the stringency index.
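One common way to quantify exponential growth is to fit a straight line to the logarithm of cumulative cases: the slope is the daily growth rate r, and ln 2 / r is the doubling time. The sketch below does this with ordinary least squares, assuming a daily series of cumulative confirmed cases in which every value is positive. Comparing the fitted rate on an earlier window with the rate on the last six weeks is one way to check whether growth has picked up.

```python
import math


def growth_rate_and_doubling_time(cases):
    """Least-squares fit of ln(cases) = a + r * day; return (r, doubling time in days).

    ASSUMPTION: `cases` is a daily series of cumulative confirmed cases with at
    least two values, all strictly positive (the log of zero is undefined).
    """
    days = range(len(cases))
    logs = [math.log(c) for c in cases]
    n = len(logs)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    r = sxy / sxx  # slope = daily exponential growth rate
    # A non-positive rate means the series is not growing, so it never doubles.
    doubling_time = math.log(2) / r if r > 0 else math.inf
    return r, doubling_time


if __name__ == "__main__":
    # Invented series that doubles exactly every 3 days.
    series = [100 * 2 ** (day / 3) for day in range(15)]
    rate, doubling = growth_rate_and_doubling_time(series)
    print(f"growth rate {rate:.3f} per day, doubling time {doubling:.1f} days")
```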
The pandemic curve showed that countries worldwide are at different stages of the pandemic, which can be attributed to the different dates on which the virus began to spread in each country. The date the virus entered a country also appears to have influenced when government measures were introduced. Each time a new measure was introduced, the number of cases was higher than when the previous measure was introduced. This suggests that governments were actively monitoring the pandemic and responding to the growing number of cases with stricter measures in an attempt to curb the spread of the virus.
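The observation that each new measure arrived at a higher case count can be checked directly: look up the cumulative case count on each measure's implementation date and test whether the sequence is strictly increasing. The sketch assumes measure dates come from a government-measures dataset such as ACAPS and that cases are available as a hypothetical date-to-count dictionary. The sample values are invented.

```python
from datetime import date


def cases_at_each_measure(measure_dates, cumulative_cases):
    """Cumulative case count on each distinct measure date, in date order.

    ASSUMPTION: `cumulative_cases` is a hypothetical {date: count} mapping that
    contains every measure date.
    """
    # set() drops measures introduced on the same day, so equal counts are not compared.
    return [cumulative_cases[d] for d in sorted(set(measure_dates))]


def strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Invented example: three measures introduced as cases grew.
    cases = {date(2020, 3, 5): 120, date(2020, 3, 12): 800, date(2020, 3, 20): 4100}
    measures = [date(2020, 3, 12), date(2020, 3, 5), date(2020, 3, 20)]
    counts = cases_at_each_measure(measures, cases)
    print(counts, strictly_increasing(counts))  # [120, 800, 4100] True
```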
However, there are often other reasons why different government measures were enacted. GDP and healthcare expenditure had a strong influence on the measures introduced. Governments of countries with better healthcare systems may have felt better placed to treat patients with the virus and therefore waited longer to put the country into lockdown. Looking at the healthcare capacities of the countries, it is clear that those with greater healthcare capacity waited longer before implementing strict government measures. Other government factors also come into play: countries with high government effectiveness and rule of law indicators had greater control over the pandemic.
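Whether countries with greater healthcare capacity waited longer before strict measures can be quantified with a rank correlation. The sketch below computes Spearman's coefficient in plain Python, assuming two hypothetical per-country lists: a healthcare-capacity measure (for example hospital beds per 1,000 people from World Bank Open Data) and the number of days from the first reported case to the first day of strict measures. Tied values receive the average of their ranks.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the average ranks.

    ASSUMPTION: x and y have equal length and neither is constant
    (a constant list would make the denominator zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Invented values for four hypothetical countries.
    beds_per_1000 = [2.5, 3.3, 6.0, 8.0]
    days_to_strict = [10, 14, 21, 30]
    print(spearman(beds_per_1000, days_to_strict))  # 1.0
```

A positive coefficient is consistent with the observation above but does not by itself establish that healthcare capacity caused the delay.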
Like the number of cases, the stringency index has increased worldwide over the last six weeks as governments responded to the rise in cases. A logistic regression model showed the stringency index increasing with the number of cases, which is expected given the stricter government measures. The same pattern appears when comparing the pandemic curves with the stringency indexes: in most countries, governments had enforced their strictest measures before the curve reached its peak, although the timing differs between countries. In most cases, after the peak and the introduction of strict lockdown measures (reflected in a high stringency index), the number of active cases began to decrease, which suggests a positive impact of government measures.
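The text does not specify how the logistic regression model was set up. One plausible reading, assumed here, is an S-shaped (logistic) curve of the stringency index, bounded at 100, against the base-10 logarithm of cumulative cases. The sketch fits the curve's steepness k and midpoint x0 by brute-force grid search, a simple stand-in for a proper optimiser such as `scipy.optimize.curve_fit`. The grids and the synthetic data in the usage example are illustrative only.

```python
import math


def logistic(x, k, x0, top=100.0):
    """S-shaped curve rising from 0 to `top`, with steepness k and midpoint x0."""
    # Keep k * (x - x0) moderate: math.exp overflows for arguments above about 709.
    return top / (1.0 + math.exp(-k * (x - x0)))


def fit_logistic_grid(log_cases, stringency, ks, x0s):
    """Return the (k, x0) pair from the candidate grids with the smallest squared error."""
    best = None
    for k in ks:
        for x0 in x0s:
            sse = sum((s - logistic(x, k, x0)) ** 2 for x, s in zip(log_cases, stringency))
            if best is None or sse < best[0]:
                best = (sse, k, x0)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic data generated from k = 2.0, x0 = 3.0 (midpoint at 1,000 cases).
    log_cases = [0.5 * i for i in range(13)]  # log10(cases) from 0 to 6
    stringency = [logistic(x, 2.0, 3.0) for x in log_cases]
    ks = [0.5 * i for i in range(1, 9)]       # 0.5 .. 4.0
    x0s = [0.5 * i for i in range(2, 11)]     # 1.0 .. 5.0
    print(fit_logistic_grid(log_cases, stringency, ks, x0s))  # (2.0, 3.0)
```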
ACAPS. 2020. #COVID19 Government Measures Dataset. [Online]. Available: https://www.acaps.org/covid19-government-measures-dataset. [Accessed 12 April 2020].
Johns Hopkins University. 2020. Coronavirus Resource Center. [Online]. Available: https://coronavirus.jhu.edu/. [Accessed 12 April 2020].
The World Bank. 2019. World Bank Open Data. [Online]. Available: https://data.worldbank.org/. [Accessed 12 April 2020].
University of Oxford. 2020. Coronavirus Government Response Tracker. [Online]. Available: https://www.bsg.ox.ac.uk/research/research-projects/coronavirus-government-response-tracker. [Accessed 12 April 2020].